Algorithmic information, complexity and Zipf's law

Authors

  • V. K. Balasubrahmanyan
  • S. Naranan
Abstract

Zipf’s law of word frequencies for language discourses is established with statistical rigor. The data depart from Zipf’s power-law term at low frequencies, and this departure is accounted for by a modifying exponential term. Both terms arise naturally in a model for word frequencies based on Information Theory, algorithmic coding of a text that preserves the symbol sequence, concepts from quantum statistical physics and computer science, and extremum principles. The Optimum Meaning Preserving Code (OMPC) of the discourse is realized when word frequencies follow the Modified Power Law (MPL). The model predicts a variant of the MPL for the relative frequencies of a small fixed set of symbols such as letters, phonemes and grammatical words. The OMPC can be viewed as containing orderly and random parts. This leads to a quantitative definition of the complexity of a string (C) that tends to 0 for the extremes of ‘all order’ and ‘all random’ but reaches a maximum (C = 1) for a mixture of both (Gell-Mann). It is found that natural languages have maximum complexity. The uniqueness of Zipf’s power-law index (γ = 2) is shown to arise in four different ways, one of which depends on the scale invariance characteristic of fractal structures. It is argued that random text models are unsuitable for natural languages. It is speculated that a drastic change in the symbol frequency distribution, starting from phrases, is related to the emergence of meaning and coherence in a discourse.
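The abstract describes a power-law term modified by an exponential term but does not state the formula. As a minimal sketch, assume the frequency spectrum W(f), the number of distinct words that occur exactly f times, takes the form W(f) = c · exp(−μ/f) · f^(−γ). This form, the parameter names `c`, `mu` and `gamma`, and both function names are assumptions made for illustration, not details taken from the abstract.

```python
import math
from collections import Counter


def frequency_spectrum(tokens):
    """Return {f: W(f)}, where W(f) is the number of distinct
    words that occur exactly f times in the token list."""
    word_counts = Counter(tokens)              # word -> number of occurrences
    return dict(Counter(word_counts.values())) # occurrences -> number of words


def modified_power_law(f, c, gamma=2.0, mu=0.0):
    """Assumed MPL form: W(f) = c * exp(-mu / f) * f**(-gamma).

    With mu = 0 this reduces to a pure power law with index gamma.
    With mu > 0 the exponential factor suppresses W(f) most strongly
    at low f, which is where the abstract reports the departure.
    """
    return c * math.exp(-mu / f) * f ** (-gamma)


# Toy input, far too small to show a real Zipf distribution.
tokens = "the cat sat on the mat and the dog sat on the log".split()
spectrum = frequency_spectrum(tokens)

for f in sorted(spectrum):
    print(f, spectrum[f], modified_power_law(f, c=5.0, gamma=2.0, mu=0.5))
```

Fitting `c`, `gamma` and `mu` to a real corpus would need a proper estimation procedure, which this sketch leaves out.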


Similar resources

Universality of Zipf's law.

Zipf's law is the most common statistical distribution displaying scaling behavior. Cities, populations or firms are just examples of this seemingly universal law. Although many different models have been proposed, no general theoretical explanation has been shown to exist for its universality. Here, we show that Zipf's law is, in fact, an inevitable outcome of a very general class of stochasti...


Comments on "linguistic features in eukaryotic genomes"

Tsonis and Tsonis [1] study rank-ordered distributions of the number of occurrences of protein domains in four different organisms, and they argue that the power-law decay, f ∝ 1/r, of the number f of occurrences of a protein domain with its rank r suggests the presence of linguistic features in eukaryotic genomes, and that this finding "may lead to important clues about the evolution of langu...


MICA: A Hybrid Method for Corpus-Based Algorithmic Composition of Music Based on Genetic Algorithms, Zipf's Law, and Markov Models

An algorithm known as the Musical Imitation and Creativity Algorithm (MICA) that composes stylistic music based on a corpus of works in a given style is presented. The corpus works are digital music scores created from the widely available MIDI format. The algorithm restricts the note placement in compositions using a Markov chain model built from discrete-time representations of the corpus pie...


Random texts exhibit Zipf's-law-like word frequency distribution

It is shown that the distribution of word frequencies for randomly generated texts is very similar to Zipf's law observed in natural languages such as English. The facts that the frequency of occurrence of a word is almost an inverse power law function of its rank and the exponent of this inverse power law is very close to 1 are largely due to the transformation from the word's length to it...


Comments to "Bell Curves and Monkey Languages", J. Casti, Complexity, 1, 12-15 1995.

Whether there are universal laws or principles in complex systems is a fascinating and important question. Prof. John Casti uses the case of the Normal Distribution ("bell curves") to illustrate that such a universal principle is perhaps out there waiting to be discovered [1]. He suggests Zipf's law as a candidate for such a universal principle. But as the author of one of the three publications to pro...



Journal:
  • Glottometrics

Volume 4

Publication date: 2002